The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants. Only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
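Two of the practices named in the survey, patch-based processing of oversized samples and k-fold cross-validation, can be illustrated with a minimal sketch. The function names `patch_origins` and `k_fold_indices` are hypothetical and are not taken from the survey or from any challenge submission. The tiling rule (shifting the last patch back inside the image) and the round-robin fold assignment are assumptions chosen for brevity.

```python
from typing import List, Tuple


def patch_origins(shape: Tuple[int, ...], patch: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Return the start corner of every patch needed to cover an array.

    Assumption: patches are laid out with a stride equal to the patch size,
    and the last patch along each axis is shifted back so that it stays
    inside the array (edge patches may therefore overlap their neighbours).
    """
    per_axis = []
    for size, p in zip(shape, patch):
        if p > size:
            raise ValueError("patch is larger than the image along one axis")
        starts = list(range(0, size - p + 1, p))
        if starts[-1] + p < size:
            starts.append(size - p)  # extra patch that covers the remainder
        per_axis.append(starts)

    # Cartesian product of the per-axis start positions.
    origins: List[Tuple[int, ...]] = [()]
    for starts in per_axis:
        origins = [o + (s,) for o in origins for s in starts]
    return origins


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) index lists for each of k folds.

    Assumption: samples are assigned to folds round-robin, without shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits
```

For a 3D volume, `patch_origins((D, H, W), (d, h, w))` works the same way; a patch size of 1 along the first axis corresponds to treating the volume as a series of 2D slices.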
Image-text retrieval in remote sensing aims to provide flexible information for data analysis and application. In recent years, state-of-the-art methods have been dedicated to "scale decoupling" and "semantic decoupling" strategies to further enhance representation capability. However, these approaches focus on disentangling either scale or semantics and do not merge the two ideas in a unified model, which severely limits the performance of cross-modal retrieval models. To address this issue, we propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval. Specifically, we design the Bidirectional Scale Decoupling (BSD) module, which uses Salience Feature Extraction (SFE) and Salience-Guided Suppression (SGS) units to adaptively extract potential features and suppress cumbersome features at other scales in a bidirectional pattern, yielding clues at different scales. In addition, we design the Label-supervised Semantic Decoupling (LSD) module, which leverages category semantic labels as prior knowledge to supervise images and texts in probing significant semantic-related information. Finally, we design a Semantic-guided Triple Loss (STL), which adaptively generates a constant to adjust the loss function, improving the probability of matching images and texts with the same semantics and shortening the convergence time of the retrieval model. Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.
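Unsupervised video domain adaptation is a practical yet challenging task. In this work, we tackle it for the first time from a disentanglement view. Our key idea is to disentangle the domain-related information from the data during adaptation. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static domain-related information and the other encoding the temporal and semantic-related information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model this generation. To better serve adaptation, we further propose several objectives to constrain the latent factors in TranSVAE. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art methods. Code is publicly available at https://github.com/ldkong1205/transvae.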
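Photorealistic stylization of 3D scenes aims to generate photorealistic images from arbitrary novel views according to a given style image, while ensuring consistency when rendering from different viewpoints. Some existing stylization methods based on neural radiance fields can effectively predict stylized scenes by combining the features of the style image with multi-view images to train 3D scenes. However, these methods generate novel-view images that contain objectionable artifacts. In addition, they cannot achieve universal photorealistic stylization for a 3D scene: each new style image requires retraining the 3D scene representation network based on the neural radiance field. We propose a novel photorealistic style transfer framework for 3D scenes to address these issues. It achieves photorealistic 3D scene style transfer from a 2D style image. We first pre-train a 2D photorealistic style transfer network, which can perform photorealistic style transfer between any given content image and style image. We then use voxel features to optimize the 3D scene and obtain its geometric representation. Finally, we jointly optimize a hypernetwork to achieve photorealistic style transfer of the scene for arbitrary style images. In the transfer stage, we use the pre-trained 2D photorealistic network to constrain the photorealistic style across different views and different style images in the 3D scene. Experimental results show that our method not only achieves 3D photorealistic style transfer for arbitrary style images but also outperforms existing methods in terms of visual quality and consistency. Project page: https://semchan.github.io/upst_nerf.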
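Deep learning methodology has contributed greatly to the development of the hyperspectral image (HSI) analysis community. However, it also makes HSI analysis systems vulnerable to adversarial attacks. To this end, we propose in this paper a masked spatial-spectral autoencoder (MSSA), grounded in self-supervised learning theory, to enhance the robustness of HSI analysis systems. First, a masked sequence attention learning module is used to promote the inherent robustness of HSI analysis systems along the spectral channel. Then, we develop a graph convolutional network with a learnable graph structure to establish global pixel-wise combinations. In this way, the attack effect is dispersed across all related pixels within each combination, and better defense performance can be achieved in the spatial aspect. Finally, to improve the defense capability and address the problem of limited labeled samples, MSSA employs spectral reconstruction as a pretext task and fits the datasets in a self-supervised manner. [...] hyperspectral classification methods and representative adversarial defense strategies.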
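Binary neural networks (BNNs) provide a promising solution for deploying parameter-intensive deep single image super-resolution (SISR) models onto real devices with limited storage and computational resources. To achieve performance comparable to their full-precision counterparts, most existing BNNs for SISR focus on compensating for the information loss incurred by binarizing the weights and activations in the network through better approximations of the binarized convolution. In this study, we revisit the difference between BNNs and their full-precision counterparts and argue that the key to good generalization performance of BNNs lies in preserving a complete full-precision information flow, as well as an accurate gradient flow, through each binarized convolution layer. Inspired by this, we propose to introduce a full-precision skip connection, or a variant of it, over each binarized convolution layer across the entire network. This increases the forward expressive capability and the accuracy of the back-propagated gradient, and thus improves generalization performance. More importantly, such a scheme is applicable to any existing BNN backbone for SISR without introducing any additional computational cost. To demonstrate its efficacy, we evaluate it with four different SISR backbones on four benchmark datasets and report clearly superior performance over existing BNNs and even some 4-bit competitors.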
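The idea of wrapping a binarized layer in a full-precision skip connection can be sketched as follows. This is only an illustration under stated assumptions: it uses a small dense layer instead of a convolution, sign binarization without scaling factors, and hypothetical function names (`binary_linear`, `binary_block_with_skip`). It does not reproduce the paper's architecture or its training procedure.

```python
from typing import List


def sign(x: float) -> float:
    """Binarize a value to +1 or -1 (assumption: zero maps to +1)."""
    return 1.0 if x >= 0 else -1.0


def binary_linear(x: List[float], w: List[List[float]]) -> List[float]:
    """Dense layer whose inputs and weights are both binarized with sign()."""
    xb = [sign(v) for v in x]
    return [sum(sign(wij) * xj for wij, xj in zip(row, xb)) for row in w]


def binary_block_with_skip(x: List[float], w: List[List[float]]) -> List[float]:
    """Compute y = x + binary_linear(x, w).

    The identity path carries the full-precision input past the binarized
    layer, so magnitude information lost by sign() is still present in y.
    """
    if len(w) != len(x):
        raise ValueError("the identity skip needs as many outputs as inputs")
    out = binary_linear(x, w)
    return [xi + oi for xi, oi in zip(x, out)]
```

Two inputs with the same sign pattern, such as `[0.3, -1.2]` and `[0.9, -0.1]`, give identical outputs from `binary_linear` but different outputs from `binary_block_with_skip`, which is the information the skip path is meant to preserve.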
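Joint learning over multimodal images is desirable for accurate brain tumor segmentation from magnetic resonance imaging (MRI). In clinical practice, however, it is not always possible to acquire a complete set of MRIs, and the problem of missing modalities causes severe performance degradation in existing multimodal segmentation methods. In this work, we present the first attempt to exploit the Transformer for multimodal brain tumor segmentation that is robust to any combinatorial subset of available modalities. Specifically, we propose a novel multimodal medical Transformer (mmFormer) for incomplete multimodal learning with three main components: hybrid modality-specific encoders that bridge a convolutional encoder and an intra-modal Transformer for both local and global context modeling within each modality; an inter-modal Transformer that builds and aligns long-range correlations across modalities to obtain modality-invariant features with global semantics corresponding to the tumor region; and a decoder that performs progressive upsampling and fusion with the modality-invariant features to generate robust segmentation. In addition, auxiliary regularizers are introduced in both the encoder and the decoder to further enhance the model's robustness to incomplete modalities. We conduct extensive experiments on the public BraTS 2018 dataset for brain tumor segmentation. The results show that the proposed mmFormer outperforms state-of-the-art methods for incomplete multimodal brain tumor segmentation on almost all subsets of incomplete modalities, notably with an average improvement of 19.07% in Dice for tumor segmentation when only one modality is available. The code is available at https://github.com/yaozhang93/mmmenforer.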
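The Dice score used above to report segmentation quality is the standard overlap measure 2|P ∩ T| / (|P| + |T|) between a predicted mask P and a reference mask T. A minimal sketch for flattened binary masks follows. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions, and this is not the evaluation code of the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction `[1, 1, 0, 0]` against a reference `[1, 0, 1, 0]` shares one foreground voxel out of four foreground labels in total, giving a Dice score of 0.5.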
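In recent years, graph neural networks (GNNs) have shown superior performance in diverse real-world applications. To improve model capacity, GNN topology design is as important as the design of aggregation operations. In general, there are two mainstream ways of designing GNN topology. The first is to stack aggregation operations to obtain higher-level features, but this easily leads to a performance drop as the network goes deeper. The second is to use multiple aggregation operations in each layer, which provides an adequate and independent feature extraction stage on local neighbors but makes higher-level information costly to obtain. To enjoy the benefits of both while alleviating their respective deficiencies, we learn to design the topology of GNNs from a novel feature fusion perspective, an approach we call F$^2$GNN. Specifically, we provide a feature fusion perspective on GNN topology design and propose a novel framework that unifies existing topology designs through feature selection and fusion strategies. We then develop a neural architecture search method on top of the unified framework, which contains a set of selection and fusion operations in the search space and an improved differentiable search algorithm. The performance gains on eight real-world datasets demonstrate the effectiveness of F$^2$GNN. We further conduct experiments showing that, by adaptively using features of different levels, F$^2$GNN can improve model capacity while alleviating the deficiencies of existing GNN topology designs, in particular the over-smoothing problem.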
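In recent years, graph neural networks (GNNs) have shown superior performance in diverse applications on real-world datasets. To improve model capacity and alleviate the over-smoothing problem, several methods have been proposed that incorporate intermediate layers through layer-wise connections. However, because graph types are highly diverse, the performance of existing methods varies across graphs, which creates a need for data-specific layer-wise connection methods. To address this problem, we propose LLC (Learn Layer-wise Connections), a novel framework based on neural architecture search (NAS) that learns adaptive connections among intermediate layers in GNNs. LLC contains a novel search space, consisting of three types of blocks and learnable connections, and a differentiable search process that makes the search efficient. Extensive experiments on five real-world datasets show that the searched layer-wise connections not only improve performance but also alleviate the over-smoothing problem.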
Temperature monitoring over the lifetime of heat-source components in engineering systems is essential to guarantee the normal operation and working life of these components. However, prior methods, which mainly use interpolation to reconstruct the temperature field from limited monitoring points, require large amounts of temperature tensors for an accurate estimation. This may decrease the availability and reliability of the system and sharply increase the monitoring cost. To solve this problem, this work develops a novel physics-informed deep reversible regression model for temperature field reconstruction of heat-source systems (TFR-HSS), which can better reconstruct the temperature field from limited monitoring points in an unsupervised manner. First, we define the TFR-HSS task mathematically and model it numerically, thereby transforming it into an image-to-image regression problem. We then develop the deep reversible regression model, which can better learn the physical information, especially near the boundary. Finally, considering the physical characteristics of heat conduction as well as the boundary conditions, this work proposes a physics-informed reconstruction loss comprising four training losses and jointly learns the deep surrogate model with these losses in an unsupervised manner. Experimental studies have been conducted on typical two-dimensional heat-source systems to demonstrate the effectiveness of the proposed method.
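To illustrate what a physics-informed loss for heat conduction can look like, the sketch below penalizes the residual of the steady-state equation k∇²T + φ = 0 at interior grid points, using a five-point finite-difference Laplacian. This rests on assumptions throughout: the steady-state form, constant conductivity `k`, uniform grid spacing `h`, and the function name `heat_residual_loss` are illustrative choices. The abstract does not specify its four loss terms, and the boundary-condition terms it mentions are not shown here.

```python
from typing import List

Grid = List[List[float]]


def heat_residual_loss(T: Grid, phi: Grid, k: float = 1.0, h: float = 1.0) -> float:
    """Mean squared residual of k * laplacian(T) + phi = 0 over interior points.

    T   : predicted temperature field on a regular 2D grid
    phi : heat-source intensity on the same grid
    k   : thermal conductivity (assumed constant)
    h   : grid spacing (assumed uniform in both directions)
    """
    rows, cols = len(T), len(T[0])
    if rows < 3 or cols < 3:
        raise ValueError("grid needs at least one interior point")
    total = 0.0
    count = 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point stencil approximation of the Laplacian.
            lap = (T[i + 1][j] + T[i - 1][j] + T[i][j + 1] + T[i][j - 1]
                   - 4.0 * T[i][j]) / (h * h)
            residual = k * lap + phi[i][j]
            total += residual * residual
            count += 1
    return total / count
```

Because the loss depends only on the predicted field and the known source layout, it needs no ground-truth temperature labels, which is what makes this kind of term usable in unsupervised training.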